Goto

Collaborating Authors

 energy efficiency


Spiking Token Mixer: An event-driven friendly Former structure for spiking neural networks

Neural Information Processing Systems

Spiking neural networks (SNNs), inspired by biological processes, use spike signals for inter-layer communication, presenting an energy-efficient alternative to traditional neural networks. To realize the theoretical advantages of SNNs in energy efficiency, it is essential to deploy them onto neuromorphic chips. On clock-driven synchronous chips, employing shorter time steps can enhance energy efficiency but reduce SNN performance. Compared to the clock-driven synchronous chip, the event-driven asynchronous chip achieves much lower energy consumption but only supports some specific network operations. Recently, a series of SNN projects have achieved tremendous success, significantly improving the SNN's performance. However, event-driven asynchronous chips do not support some of the proposed structures, making it impossible to integrate these SNNs into asynchronous hardware. In response to these problems, we propose the Spiking Token Mixer (STMixer) architecture, which consists exclusively of operations supported by asynchronous scenarios, including convolutional, fully connected layers and residual paths. Our series of experiments also demonstrates that STMixer achieves performance on par with spiking transformers in synchronous scenarios with very low timesteps. This indicates its ability to achieve the same level of performance with lower power consumption in synchronous scenarios.


Spiking Graph Neural Network on Riemannian Manifolds

Neural Information Processing Systems

Graph neural networks (GNNs) have become the dominant solution for learning on graphs, the typical non-Euclidean structures. Conventional GNNs, constructed with the Artificial Neuron Network (ANN), have achieved impressive performance at the cost of high computation and energy consumption. In parallel, spiking GNNs with brain-like spiking neurons are drawing increasing research attention owing to the energy efficiency. So far, existing spiking GNNs consider graphs in Euclidean space, ignoring the structural geometry, and suffer from the high latency issue due to Back-Propagation-Through-Time (BPTT) with the surrogate gradient. In light of the aforementioned issues, we are devoted to exploring spiking GNN on Riemannian manifolds, and present a Manifold-valued Spiking GNN (MSG). In particular, we design a new spiking neuron on geodesically complete manifolds with the diffeomorphism, so that BPTT regarding the spikes is replaced by the proposed differentiation via manifold. Theoretically, we show that MSG approximates a solver of the manifold ordinary differential equation. Extensive experiments on common graphs show the proposed MSG achieves superior performance to previous spiking GNNs and energy efficiency to conventional GNNs.


Performance Measurements in the AI-Centric Computing Continuum Systems

Donta, Praveen Kumar, Zhang, Qiyang, Dustdar, Schahram

arXiv.org Artificial Intelligence

Over the Eight decades, computing paradigms have shifted from large, centralized systems to compact, distributed architectures, leading to the rise of the Distributed Computing Continuum (DCC). In this model, multiple layers such as cloud, edge, Internet of Things (IoT), and mobile platforms work together to support a wide range of applications. Recently, the emergence of Generative AI and large language models has further intensified the demand for computational resources across this continuum. Although traditional performance metrics have provided a solid foundation, they need to be revisited and expanded to keep pace with changing computational demands and application requirements. Accurate performance measurements benefit both system designers and users by supporting improvements in efficiency and promoting alignment with system goals. In this context, we review commonly used metrics in DCC and IoT environments. We also discuss emerging performance dimensions that address evolving computing needs, such as sustainability, energy efficiency, and system observability. We also outline criteria and considerations for selecting appropriate metrics, aiming to inspire future research and development in this critical area.


Edge Deployment of Small Language Models, a comprehensive comparison of CPU, GPU and NPU backends

Prieto, Pablo, Abad, Pablo

arXiv.org Artificial Intelligence

Edge computing processes data where it is generated, enabling faster decisions, lower bandwidth usage, and improved privacy. However, edge devices typically operate under strict constraints on processing power, memory, and energy consumption, making them unsuitable for large language models (LLMs). Fortunately, Small Language Models (SLMs) offer lightweight alternatives that bring AI inference to resource-constrained environments by significantly reducing computational cost while remaining suitable for specialization and customization. In this scenario, selecting the hardware platform that best balances performance and efficiency for SLM inference is challenging due to strict resource limitations. To address this issue, this study evaluates the inference performance and energy efficiency of commercial CPUs (Intel and ARM), GPUs (NVIDIA), and NPUs (RaiderChip) for running SLMs. GPUs, the usual platform of choice, are compared against commercial NPUs and recent multi-core CPUs. While NPUs leverage custom hardware designs optimized for computation, modern CPUs increasingly incorporate dedicated features targeting language-model workloads. Using a common execution framework and a suite of state-of-the-art SLMs, we analyze both maximum achievable performance and processing and energy efficiency across commercial solutions available for each platform. The results indicate that specialized backends outperform general-purpose CPUs, with NPUs achieving the highest performance by a wide margin. Bandwidth normalization proves essential for fair cross-architecture comparisons. Although low-power ARM processors deliver competitive results when energy usage is considered, metrics that combine performance and power (such as EDP) again highlight NPUs as the dominant architecture. These findings show that designs optimized for both efficiency and performance offer a clear advantage for edge workloads.


Khalasi: Energy-Efficient Navigation for Surface Vehicles in Vortical Flow Fields

Gadhvi, Rushiraj, Manjanna, Sandeep

arXiv.org Artificial Intelligence

For centuries, khalasi (Gujarati for sailor) have skillfully harnessed ocean currents to navigate vast waters with minimal effort. Emulating this intuition in autonomous systems remains a significant challenge, particularly for Autonomous Surface Vehicles tasked with long duration missions under strict energy budgets. In this work, we present a learning-based approach for energy-efficient surface vehicle navigation in vortical flow fields, where partial observability often undermines traditional path-planning methods. We present an end to end reinforcement learning framework based on Soft Actor Critic that learns flow-aware navigation policies using only local velocity measurements. Through extensive evaluation across diverse and dynamically rich scenarios, our method demonstrates substantial energy savings and robust generalization to previously unseen flow conditions, offering a promising path toward long term autonomy in ocean environments. The navigation paths generated by our proposed approach show an improvement in energy conservation 30 to 50 percent compared to the existing state of the art techniques.


HH-PIM: Dynamic Optimization of Power and Performance with Heterogeneous-Hybrid PIM for Edge AI Devices

Jeon, Sangmin, Lee, Kangju, Lee, Kyeongwon, Lee, Woojoo

arXiv.org Artificial Intelligence

--Processing-in-Memory (PIM) architectures offer promising solutions for efficiently handling AI applications in energy-constrained edge environments. While traditional PIM designs enhance performance and energy efficiency by reducing data movement between memory and processing units, they are limited in edge devices due to continuous power demands and the storage requirements of large neural network weights in SRAM and DRAM. Hybrid PIM architectures, incorporating nonvolatile memories like MRAM and ReRAM, mitigate these limitations but struggle with a mismatch between fixed computing resources and dynamically changing inference workloads. T o address these challenges, this study introduces a Heterogeneous-Hybrid PIM ( HH-PIM) architecture, comprising high-performance MRAM-SRAM PIM modules and low-power MRAM-SRAM PIM modules. We further propose a data placement optimization algorithm that dynamically allocates data based on computational demand, maximizing energy efficiency. FPGA prototyping and power simulations with processors featuring HH-PIM and other PIM types demonstrate that the proposed HH-PIM achieves up to 60.43% average energy savings over conventional PIMs while meeting application latency requirements. These results confirm HH-PIM's suitability for adaptive, energy-efficient AI processing in edge devices. With the advent of artificial intelligence (AI), real-world applications are rapidly expanding, fueling a trend to embed AI capabilities into IoT devices across diverse fields. However, traditional server-centric data processing, such as cloud computing, faces significant energy and latency challenges due to processing and communication overloads.


A Trustworthy By Design Classification Model for Building Energy Retrofit Decision Support

Rempi, Panagiota, Pelekis, Sotiris, Tzortzis, Alexandros Menelaos, Spiliotis, Evangelos, Karakolis, Evangelos, Ntanos, Christos, Askounis, Dimitris

arXiv.org Artificial Intelligence

Improving energy efficiency in residential buildings is critical to combating climate change and reducing greenhouse gas emissions. Retrofitting existing buildings, which contribute a significant share of energy use, is therefore a key priority, especially in regions with outdated building stock. Artificial Intelligence (AI) and Machine Learning (ML) can automate retrofit decision-making and find retrofit strategies. However, their use faces challenges of data availability, model transparency, and compliance with national and EU AI regulations including the AI act, ethics guidelines and the ALTAI. This paper presents a trustworthy-by-design ML-based decision support framework that recommends energy efficiency strategies for residential buildings using minimal user-accessible inputs. The framework merges Conditional Tabular Generative Adversarial Networks (CTGAN) to augment limited and imbalanced data with a neural network-based multi-label classifier that predicts potential combinations of retrofit actions. To support explanation and trustworthiness, an Explainable AI (XAI) layer using SHapley Additive exPlanations (SHAP) clarifies the rationale behind recommendations and guides feature engineering. Two case studies validate performance and generalization: the first leveraging a well-established, large EPC dataset for England and Wales; the second using a small, imbalanced post-retrofit dataset from Latvia (RETROFIT-LAT). Results show that the framework can handle diverse data conditions and improve performance up to 53% compared to the baseline. Overall, the proposed framework provides a feasible, interpretable, and trustworthy AI system for building retrofit decision support through assured performance, usability, and transparency to aid stakeholders in prioritizing effective energy investments and support regulation-compliant, data-driven innovation in sustainable energy transition.


Hardware-Software Collaborative Computing of Photonic Spiking Reinforcement Learning for Robotic Continuous Control

Yu, Mengting, Xiang, Shuiying, Xie, Changjian, Chen, Yonghang, Zhao, Haowen, Guo, Xingxing, Zhang, Yahui, Han, Yanan, Hao, Yue

arXiv.org Artificial Intelligence

Robotic continuous control tasks impose stringent demands on the energy efficiency and latency of computing architectures due to their high-dimensional state spaces and real-time interaction requirements. Conventional electronic computing platforms face computational bottlenecks, whereas the fusion of photonic computing and spiking reinforcement learning (RL) offers a promising alternative. Here, we propose a novel computing architecture based on photonic spiking RL, which integrates the Twin Delayed Deep Deterministic policy gradient (TD3) algorithm with spiking neural network (SNN). The proposed architecture employs an optical-electronic hybrid computing paradigm wherein a silicon photonic Mach-Zehnder interferometer (MZI) chip executes linear matrix computations, while nonlinear spiking activations are performed in the electronic domain. Experimental validation on the Pendulum-v1 and HalfCheetah-v2 benchmarks demonstrates the system capability for software-hardware co-inference, achieving a control policy reward of 5831 on HalfCheetah-v2, a 23.33% reduction in convergence steps, and an action deviation below 2.2%. Notably, this work represents the first application of a programmable MZI photonic computing chip to robotic continuous control tasks, attaining an energy efficiency of 1.39 TOPS/W and an ultralow computational latency of 120 ps. Such performance underscores the promise of photonic spiking RL for real-time decision-making in autonomous and industrial robotic systems.


MS-PPO: Morphological-Symmetry-Equivariant Policy for Legged Robot Locomotion

Wei, Sizhe, Chen, Xulin, Xie, Fengze, Katz, Garrett Ethan, Gan, Zhenyu, Gan, Lu

arXiv.org Artificial Intelligence

Reinforcement learning has recently enabled impressive locomotion capabilities on legged robots; however, most policy architectures remain morphology- and symmetry-agnostic, leading to inefficient training and limited generalization. This work introduces MS-PPO, a morphological-symmetry-equivariant policy learning framework that encodes robot kinematic structure and morphological symmetries directly into the policy network. We construct a morphology-informed graph neural architecture that is provably equivariant with respect to the robot's morphological symmetry group actions, ensuring consistent policy responses under symmetric states while maintaining invariance in value estimation. This design eliminates the need for tedious reward shaping or costly data augmentation, which are typically required to enforce symmetry. We evaluate MS-PPO in simulation on Unitree Go2 and Xiaomi CyberDog2 robots across diverse locomotion tasks, including trotting, pronking, slope walking, and bipedal turning, and further deploy the learned policies on hardware. Extensive experiments show that MS-PPO achieves superior training stability, symmetry generalization ability, and sample efficiency in challenging locomotion tasks, compared to state-of-the-art baselines. These findings demonstrate that embedding both kinematic structure and morphological symmetry into policy learning provides a powerful inductive bias for legged robot locomotion control. Our code will be made publicly available at https://lunarlab-gatech.github.io/MS-PPO/.


A Cross-Embodiment Gripper Benchmark for Rigid-Object Manipulation in Aerial and Industrial Robotics

Vagas, Marek, Varga, Martin, Romancik, Jaroslav, Majercak, Ondrej, Suarez, Alejandro, Ollero, Anibal, Vanderborght, Bram, Virgala, Ivan

arXiv.org Artificial Intelligence

Abstract--Robotic grippers are increasingly deployed across industrial, collaborative, and aerial platforms, where each embodiment imposes distinct mechanical, energetic, and operational constraints. Established YCB and NIST benchmarks quantify grasp success, force, or timing on a single platform, but do not evaluate cross-embodiment transferability or energy-aware performance, capabilities essential for modern mobile and aerial manipulation. This letter introduces the Cross-Embodiment Gripper Benchmark (CEGB), a compact and reproducible benchmarking suite extending YCB and selected NIST metrics with three additional components: a transfer-time benchmark measuring the practical effort required to exchange embodiments, an energy-consumption benchmark evaluating grasping and holding efficiency, and an intent-specific ideal payload assessment reflecting design-dependent operational capability. T ogether, these metrics characterize both grasp performance and the suitability of reusing a single gripper across heterogeneous robotic systems. A lightweight self-locking gripper prototype is implemented as a reference case. Experiments demonstrate rapid embodiment transfer (median 17.6 s across user groups), low holding energy for gripper prototype ( 1.5 J per 10 s), and consistent grasp performance with cycle times of 3.2-3.9 CEGB thus provides a reproducible foundation for cross-platform, energy-aware evaluation of grippers in aerial and manipulators domains. Robotic grasping has been extensively investigated across industrial, collaborative, and aerial domains.